Assignment VIII: Deep Learning

Question 1

Use the dataset DEMO_DATA/chinese_name_gender.txt to create a Chinese name gender classifier using deep learning methods. You need to address a few important considerations in the creation of the deep learning classifier.

  1. Please consult the lecture notes and experiment with different architectures of neural networks. In particular, please try combinations of the following types of network layers:

    • dense layer

    • embedding layer

    • RNN layer

    • bidirectional layer

  2. Please include regularization and dropout to avoid the issue of overfitting.

  3. Please demonstrate how you find the optimal hyperparameters for the neural network using keras-tuner.

  4. Please perform post-hoc analyses on a few cases using LIME for more interpretable results.

Prepare Data

Train-Test Split
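
A minimal sketch of the split using scikit-learn's train_test_split on a toy frame standing in for the dataset (the column names and label coding here are assumptions, not the actual file layout):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy frame standing in for DEMO_DATA/chinese_name_gender.txt; the column
# names and 0/1 gender coding are assumptions for illustration only.
df = pd.DataFrame({
    "name": ["陳小明", "林美麗", "王大同", "李安芬", "張志強",
             "黃淑芬", "吳建宏", "蔡雅婷", "劉俊傑", "鄭佳蓉"],
    "gender": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
})

# Stratify on the label so both splits keep the same gender ratio.
train_df, test_df = train_test_split(
    df, test_size=0.2, random_state=42, stratify=df["gender"])
print(len(train_df), len(test_df))  # 8 2
```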

Tokenizer

  • By default, the token index 0 is reserved for padding token.

  • If oov_token is specified, it defaults to index 1.

  • Specify num_words for the tokenizer to include only the top N words in the model.

  • The tokenizer automatically removes punctuation.

  • The tokenizer uses whitespace as the word delimiter.

  • If every character is treated as a token, specify char_level=True.
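
The points above can be sketched in a few lines; the names below are made-up examples, not the assignment data:

```python
from tensorflow.keras.preprocessing.text import Tokenizer

# char_level=True treats every Chinese character as a token; index 0 stays
# reserved for padding, and the OOV token gets index 1.
names = ["陳小明", "林美麗"]
tokenizer = Tokenizer(num_words=5000, oov_token="[UNK]", char_level=True)
tokenizer.fit_on_texts(names)

print(tokenizer.word_index["[UNK]"])            # 1
seqs = tokenizer.texts_to_sequences(["王小明"])  # unseen 王 maps to OOV index
print(seqs)
```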

Prepare Input and Output Tensors

  • As in feature-based machine learning, a computational model only accepts numeric values. It is necessary to convert raw text into numeric tensors for the neural network.

  • After we create the Tokenizer, we use the Tokenizer to perform text vectorization, i.e., converting texts into tensors.

  • In deep learning, words or characters are automatically converted into numeric representations.

  • In other words, the feature engineering step is fully automatic.

Two Ways of Text Vectorization

  • Texts to Sequences: integer-encode the tokens in each text and learn token embeddings

  • Texts to Matrix: One-hot encoding of texts (similar to bag-of-words model)

Method 1: Text to Sequences

From Texts and Sequences

  • Text to Sequences

  • Padding to uniform lengths for each text

Vocabulary

Padding

  • When padding all texts to uniform lengths, consider whether to pad or truncate at the beginning of each sequence (pre) or at the end (post).

  • Check padding and truncating parameters in pad_sequences
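
A small sketch of pre-padding and pre-truncating with pad_sequences (the sequence values are made up):

```python
from tensorflow.keras.preprocessing.sequence import pad_sequences

seqs = [[5, 3], [7, 2, 9, 4]]
# padding='pre' adds zeros at the front of short sequences;
# truncating='pre' drops values from the front of over-long ones.
padded = pad_sequences(seqs, maxlen=3, padding="pre", truncating="pre")
print(padded.tolist())  # [[0, 5, 3], [2, 9, 4]]
```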

Define X and Y

Method 2: Text to Matrix

One-Hot Encoding

  • Text to Matrix (to create bag-of-word representation of each text)

  • Choose modes: binary, count, or tfidf

  • names_matrix is in fact a bag-of-characters representation of a name.
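
For illustration, a sketch of texts_to_matrix in binary mode on made-up names:

```python
from tensorflow.keras.preprocessing.text import Tokenizer

names = ["陳小明", "林小美"]   # made-up example names
tok = Tokenizer(char_level=True)
tok.fit_on_texts(names)

# mode can be 'binary', 'count', or 'tfidf'
names_matrix = tok.texts_to_matrix(names, mode="binary")
print(names_matrix.shape)  # (2, vocab_size + 1); column 0 stays reserved
```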

Define X and Y

Model Definition

  • After we have defined our input and output tensors (X and y), we can define the architecture of our neural network model.

  • For the two vectorized representations of names, we try two different network structures.

    • Text to Sequences: Embedding + RNN

    • Text to Matrix: Fully connected Dense Layers

Model 1: Fully Connected Dense Layers

  • Two fully-connected dense layers with the Text-to-Matrix inputs
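
A sketch of what such a model might look like; the layer widths and vocabulary size are assumptions, not the assignment's answer:

```python
from tensorflow import keras
from tensorflow.keras import layers

VOCAB_SIZE = 6  # assumed size of the bag-of-characters vector

model1 = keras.Sequential([
    keras.Input(shape=(VOCAB_SIZE,)),
    layers.Dense(16, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),   # binary gender output
])
model1.compile(optimizer="adam", loss="binary_crossentropy",
               metrics=["accuracy"])
print(model1.count_params())  # 401
```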

plot_model(model1, show_shapes=True)

A few hyperparameters for network training

  • Batch size

  • Epoch

  • Validation Split Ratio
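
These three hyperparameters are passed to Model.fit(); a runnable sketch on dummy data (all shapes and values below are illustrative):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Dummy data standing in for the vectorized names (shapes are assumptions).
X_train = np.random.rand(100, 6)
y_train = np.random.randint(0, 2, size=(100,)).astype("float32")

model = keras.Sequential([
    keras.Input(shape=(6,)),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

history = model.fit(
    X_train, y_train,
    batch_size=32,          # samples per gradient update
    epochs=2,               # passes over the full training set
    validation_split=0.2,   # fraction of training data held out
    verbose=0,
)
print(sorted(history.history.keys()))
```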

Model 2: Embedding + RNN

  • One Embedding Layer + One RNN Layer

  • With Text-to-Sequence inputs
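
A possible sketch; the embedding dimension and RNN width are assumptions:

```python
from tensorflow import keras
from tensorflow.keras import layers

VOCAB_SIZE = 5000   # assumed size from the tokenizer

model2 = keras.Sequential([
    layers.Embedding(VOCAB_SIZE, 32),   # learn 32-d character embeddings
    layers.SimpleRNN(16),               # read the sequence step by step
    layers.Dense(1, activation="sigmoid"),
])
model2.compile(optimizer="adam", loss="binary_crossentropy",
               metrics=["accuracy"])
```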

plot_model(model2, show_shapes=True)

Model 3: Regularization and Dropout

  • The previous two examples clearly show overfitting because model performance on the validation set starts to stall after the first few epochs.

  • We can add regularization and dropout to our network definition to avoid overfitting.
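
One possible way to add both, sketched with assumed rates and an L2 penalty:

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

model3 = keras.Sequential([
    layers.Embedding(5000, 32),               # assumed vocab size / dim
    layers.SimpleRNN(16, dropout=0.3,         # drop input connections
                     recurrent_dropout=0.3),  # drop recurrent connections
    layers.Dense(1, activation="sigmoid",
                 kernel_regularizer=regularizers.l2(0.01)),  # L2 penalty
])
```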

plot_model(model3)

Model 4: Improve the Models

  • In addition to regularization and dropout, we can further improve the model by increasing its complexity.

  • In particular, we can increase the depths and widths of the network layers.

  • Let’s try stacking two RNN layers.
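
Stacking requires the lower RNN layer to return its full output sequence (return_sequences=True); a sketch with assumed sizes:

```python
from tensorflow import keras
from tensorflow.keras import layers

model4 = keras.Sequential([
    layers.Embedding(5000, 32),                    # assumed vocab size / dim
    layers.SimpleRNN(16, return_sequences=True),   # emit output at every step
    layers.SimpleRNN(16),                          # consume that sequence
    layers.Dense(1, activation="sigmoid"),
])
```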

plot_model(model4)

Model 5: Bidirectional

  • Now let’s try a more sophisticated RNN variant, the LSTM, with bidirectional computation.

  • And add more nodes to the LSTM layer.
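
A sketch with an assumed layer width:

```python
from tensorflow import keras
from tensorflow.keras import layers

model5 = keras.Sequential([
    layers.Embedding(5000, 32),              # assumed vocab size / dim
    layers.Bidirectional(layers.LSTM(32)),   # read the name in both directions
    layers.Dense(1, activation="sigmoid"),
])
```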

plot_model(model5)

Check Embeddings

  • Compared to one-hot encodings of characters, embeddings may include more information relating to the characteristics of the characters.

  • We can extract the embedding layer and apply dimensionality reduction techniques (e.g., t-SNE) to see how the embeddings capture relationships between characters.
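
A minimal sketch using an untrained toy embedding layer; in practice you would pull the trained layer out of your own model:

```python
import numpy as np
from sklearn.manifold import TSNE
from tensorflow.keras import layers

# Toy embedding layer standing in for the trained one (assumed sizes).
emb_layer = layers.Embedding(50, 8)
emb_layer.build((None, 4))
emb = emb_layer.get_weights()[0]   # shape: (vocab_size, embedding_dim)

# Project each character embedding down to 2-d for plotting.
coords = TSNE(n_components=2, perplexity=5, init="random",
              random_state=42).fit_transform(emb)
print(coords.shape)  # one 2-d point per character
```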

Hyperparameter Tuning

Note

Please install the keras-tuner module in your current conda environment:

pip install -U keras-tuner

  • Like feature-based ML methods, neural networks come with many hyperparameters, whose values need to be chosen rather than simply left at defaults.

  • Typical hyperparameters include:

    • Number of nodes for the layer

    • Learning Rates

  • We can utilize the module, keras-tuner, to fine-tune the hyperparameters.

  • Steps for Keras Tuner

    • First, wrap the model definition in a function, which takes a single hp argument.

    • Inside this function, replace any value we want to tune with a call to hyperparameter sampling methods, e.g. hp.Int() or hp.Choice(). The function should return a compiled model.

    • Next, instantiate a tuner object specifying your optimization objective and other search parameters.

    • Finally, start the search with the search() method, which takes the same arguments as Model.fit() in keras.

    • When the search is over, we can retrieve the best model and a summary of the results from the tuner.

  • The max_trials variable represents the number of hyperparameter combinations that will be tested by the tuner.

  • The executions_per_trial variable is the number of models that are built and fit for each trial, for robustness purposes.

Explanation

Train Model with the Tuned Hyperparameters

Interpret the Model

from lime.lime_text import LimeTextExplainer

explainer = LimeTextExplainer(class_names=['Male'], char_level=True)

exp = explainer.explain_instance(
    X_test_texts[text_id], model_predict_pipeline,
    num_features=100, top_labels=1)
exp.show_in_notebook(text=True)

exp = explainer.explain_instance(
    '陳宥欣', model_predict_pipeline, num_features=100, top_labels=1)
exp.show_in_notebook(text=True)

exp = explainer.explain_instance(
    '李安芬', model_predict_pipeline, num_features=2, top_labels=1)
exp.show_in_notebook(text=True)

exp = explainer.explain_instance(
    '林月名', model_predict_pipeline, num_features=2, top_labels=1)
exp.show_in_notebook(text=True)

exp = explainer.explain_instance(
    '蔡英文', model_predict_pipeline, num_features=2, top_labels=1)
exp.show_in_notebook(text=True)
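
LIME expects the prediction function to take raw strings and return per-class probabilities. model_predict_pipeline is not defined in this section, so here is a hypothetical sketch with a toy tokenizer and untrained model standing in for the trained ones:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Toy tokenizer and model standing in for the trained ones (assumptions).
tokenizer = Tokenizer(char_level=True, oov_token="[UNK]")
tokenizer.fit_on_texts(["陳小明", "林美麗"])
model = keras.Sequential([
    layers.Embedding(20, 4),
    layers.SimpleRNN(4),
    layers.Dense(1, activation="sigmoid"),
])

def model_predict_pipeline(texts, maxlen=3):
    # Vectorize raw strings the same way as the training data, then return
    # probabilities for both classes with shape (n_samples, 2).
    seqs = tokenizer.texts_to_sequences(texts)
    X = pad_sequences(seqs, maxlen=maxlen)
    p = model.predict(X, verbose=0)        # sigmoid P(class 1)
    return np.hstack([1 - p, p])

probs = model_predict_pipeline(["李安芬"])
print(probs.shape)  # (1, 2)
```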